Skip to content

Enhance approx_percentile with strict weight validation, improved error handling and IT coverage#17368

Merged
JackieTien97 merged 2 commits intoapache:masterfrom
FearfulTomcat27:fix/approx-percentile-weight
Mar 26, 2026
Merged

Enhance approx_percentile with strict weight validation, improved error handling and IT coverage#17368
JackieTien97 merged 2 commits intoapache:masterfrom
FearfulTomcat27:fix/approx-percentile-weight

Conversation

@FearfulTomcat27
Copy link
Contributor

This pull request enhances the robustness and correctness of the approx_percentile aggregation function, particularly when using weighted percentiles. It introduces stricter type checks and runtime validation for the weight parameter, improves error handling, and adds corresponding integration tests to ensure the expected behavior.

Validation and Error Handling Improvements:

  • Added runtime checks in ApproxPercentileWithWeightAccumulator to ensure that weights are not null and are greater than or equal to 1; throws a SemanticException if these conditions are not met. [1] [2] [3] [4] [5] [6] [7] [8]
  • Improved type validation for the approx_percentile function in TableMetadataImpl to ensure the percentage parameter is always decimal and the weight parameter (if present) is either INT32 or unknown; throws clear semantic exceptions for unsupported types.

Testing Enhancements:

  • Added integration tests in IoTDBTableAggregationIT.java to verify correct handling of null weights and invalid weight values in approx_percentile, including checks for proper error messages when weights are negative or of unsupported types. [1] [2]

Dependency and Import Updates:

  • Added import for SemanticException in ApproxPercentileWithWeightAccumulator.java to support new exception handling logic.

Copilot AI review requested due to automatic review settings March 26, 2026 09:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR tightens correctness checks for the approx_percentile table aggregation, especially for the weighted (3-argument) form, by adding stricter argument type validation, runtime weight validation, and integration test coverage for the new behaviors.

Changes:

  • Enforce that approx_percentile’s percentage argument is a decimal type, and restrict weight (when provided) to INT32 (or UNKNOWN for NULL).
  • Add runtime validation in ApproxPercentileWithWeightAccumulator for invalid weights (e.g., < 1) and handle null weights.
  • Extend table-model integration tests to cover null/invalid weight scenarios and unsupported weight types.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/plan/relational/metadata/TableMetadataImpl.java Strengthens analyzer-time type validation for approx_percentile percentage/weight arguments.
iotdb-core/datanode/src/main/java/org/apache/iotdb/db/queryengine/execution/operator/source/relational/aggregation/ApproxPercentileWithWeightAccumulator.java Adds execution-time weight validation/handling for the non-grouped weighted accumulator.
integration-test/src/test/java/org/apache/iotdb/relational/it/query/recent/IoTDBTableAggregationIT.java Adds IT coverage for null weight result and invalid/unsupported weight failures.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +37 to +39
if (weightColumn.isNull(i)) {
continue;
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Null weights are currently silently skipped (continue), which contradicts the PR description that says weights are validated to be non-null. If the intended behavior is to reject NULL weights (at least when the value is non-null), this should throw a SemanticException instead of skipping; otherwise the PR description/tests should be aligned to explicitly define the semantics for NULL weights.

Copilot uses AI. Check for mistakes.
Comment on lines 28 to 60
@Override
public void addIntInput(Column[] arguments, AggregationMask mask) {
Column valueColumn = arguments[0];
Column weightColumn = arguments[1];

int positionCount = mask.getPositionCount();

if (mask.isSelectAll()) {
for (int i = 0; i < valueColumn.getPositionCount(); i++) {
if (weightColumn.isNull(i)) {
continue;
}
if (weightColumn.getInt(i) < 1) {
throw new SemanticException("weight must be >= 1, was " + weightColumn.getInt(i));
}
if (!valueColumn.isNull(i)) {
tDigest.add(valueColumn.getInt(i), weightColumn.getInt(i));
}
}
} else {
int[] selectedPositions = mask.getSelectedPositions();
int position;
for (int i = 0; i < positionCount; i++) {
position = selectedPositions[i];
if (weightColumn.isNull(position)) {
continue;
}
if (weightColumn.getInt(position) < 1) {
throw new SemanticException("weight must be >= 1, was " + weightColumn.getInt(position));
}
if (!valueColumn.isNull(position)) {
tDigest.add(valueColumn.getInt(position), weightColumn.getInt(position));
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the non-grouped accumulator was updated to validate/handle weights. Group-by queries use GroupedApproxPercentileWithWeightAccumulator (via AccumulatorFactory) which still calls weightColumn.getInt(...) without null/>=1 checks, so group-by approx_percentile(..., weight, ...) can still accept invalid weights or fail on NULL weights. The same validation logic should be applied to the grouped accumulator to keep behavior consistent.

Copilot uses AI. Check for mistakes.
"select approx_percentile(s1,null,0.5) from table1",
new String[] {"_col0"},
new String[] {"null,"},
DATABASE_NAME);
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new NULL-weight test only covers the non-grouped execution path. Since group-by uses a different (grouped) accumulator, it would be good to add an IT that exercises approx_percentile(..., null, ...) (and an invalid weight like -1) under GROUP BY to ensure the grouped implementation matches the intended semantics and doesn’t regress.

Suggested change
DATABASE_NAME);
DATABASE_NAME);
tableResultSetEqualTest(
"select 1 as g, approx_percentile(s1,null,0.5) from table1 group by 1",
new String[] {"g", "_col1"},
new String[] {"1,null,"},
DATABASE_NAME);

Copilot uses AI. Check for mistakes.
Comment on lines +37 to 45
if (weightColumn.isNull(i)) {
continue;
}
if (weightColumn.getInt(i) < 1) {
throw new SemanticException("weight must be >= 1, was " + weightColumn.getInt(i));
}
if (!valueColumn.isNull(i)) {
tDigest.add(valueColumn.getInt(i), weightColumn.getInt(i));
}
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weight validation happens before checking whether the corresponding value is null. This means rows with NULL values can still throw on an invalid/negative weight even though that row would otherwise be ignored by the aggregation. Consider only reading/validating the weight inside the !valueColumn.isNull(...) branch (and then validating null/<=0 weight there).

Copilot uses AI. Check for mistakes.
@JackieTien97 JackieTien97 merged commit 58ec38a into apache:master Mar 26, 2026
25 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants